A Class of Submodular Functions for Document Summarization
نویسندگان
چکیده
We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.
منابع مشابه
Large-Margin Learning of Submodular Summarization Methods
In this paper, we present a supervised learning approach to training submodular scoring functions for extractive multi-document summarization. By taking a structured predicition approach, we provide a large-margin method that directly optimizes a convex relaxation of the desired performance measure. The learning method applies to all submodular summarization methods, and we demonstrate its effe...
متن کاملMulti-document Summarization via Budgeted Maximization of Submodular Functions
We treat the text summarization problem as maximizing a submodular function under a budget constraint. We show, both theoretically and empirically, a modified greedy algorithm can efficiently solve the budgeted submodular maximization problem near-optimally, and we derive new approximation bounds in doing so. Experiments on DUC’04 task show that our approach is superior to the bestperforming me...
متن کاملTowards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging
We propose a submodular function-based summarization system which integrates three important measures namely importance, coverage, and non-redundancy to detect the important sentences for the summary. We design monotone and submodular functions which allow us to apply an efficient and scalable greedy algorithm to obtain informative and well-covered summaries. In addition, we integrate two abstr...
متن کاملMonotone Submodularity in Opinion Summaries
Opinion summarization is the task of producing the summary of a text, such that the summary also preserves the sentiment of the text. Opinion Summarization is thus a trade-off between summarization and sentiment analysis. The demand of compression may drop sentiment bearing sentences, and the demand of sentiment detection may bring in redundant sentences. We harness the power of submodularity t...
متن کاملSemi-extractive Multi-document Summarization
In this thesis, I design a Maximum Coverage problem with KnaPsack constraint (MCKP) based model for extractive multi-document summarization. The model integrates three measures to detect important sentences including Coverage, rewards sentences in regards to their representative level of the whole document, Relevance, focuses to select sentences that related to the given query, and Compression,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011